[SPARK-22602][SQL] remove ColumnVector#loadBytes by cloud-fan · Pull Request #19815 · apache/spark

cloud-fan · 2017-11-24T16:10:32Z

What changes were proposed in this pull request?

ColumnVector#loadBytes is only used as an optimization for reading UTF8String in WritableColumnVector. This PR moves that optimization into WritableColumnVector and simplifies it.
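To make the optimization concrete, here is a minimal sketch of a zero-copy string read. It assumes a simplified on-heap layout: per-row offset and length arrays, plus a child column holding the concatenated bytes. `StringColumnSketch`, `ByteChild`, `putString`, and `getStringBytes` are hypothetical names, and a read-only `java.nio.ByteBuffer` stands in for Spark's `UTF8String`. This is not the code from the PR and not Spark's real `WritableColumnVector`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of an on-heap string column (NOT Spark's
// WritableColumnVector). It only shows reading string bytes without copying.
class StringColumnSketch {
  // Plays the role of the child column that stores all string bytes back to back.
  static final class ByteChild {
    final byte[] data;
    int used;
    ByteChild(int capacity) { data = new byte[capacity]; }
  }

  private final int[] offsets;
  private final int[] lengths;
  private final ByteChild child;

  StringColumnSketch(int rows, int byteCapacity) {
    offsets = new int[rows];
    lengths = new int[rows];
    child = new ByteChild(byteCapacity);
  }

  // Hypothetical accessor for the child column; naming it states the intent.
  ByteChild arrayData() { return child; }

  // Appends the UTF-8 bytes of s to the child column and records offset/length.
  // No capacity growth: overflowing byteCapacity throws IndexOutOfBoundsException.
  void putString(int rowId, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    ByteChild c = arrayData();
    System.arraycopy(bytes, 0, c.data, c.used, bytes.length);
    offsets[rowId] = c.used;
    lengths[rowId] = bytes.length;
    c.used += bytes.length;
  }

  // Zero-copy read: a read-only view over the child's backing array, no byte copy.
  ByteBuffer getStringBytes(int rowId) {
    return ByteBuffer.wrap(arrayData().data, offsets[rowId], lengths[rowId])
        .slice()
        .asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    StringColumnSketch col = new StringColumnSketch(2, 64);
    col.putString(0, "spark");
    col.putString(1, "h\u00e9llo");
    ByteBuffer view = col.getStringBytes(1);
    // duplicate() so decoding does not advance the view's own position.
    System.out.println(StandardCharsets.UTF_8.decode(view.duplicate()));
    System.out.println(view.remaining()); // 6: the accented 'e' takes two bytes in UTF-8
  }
}
```

The returned view shares memory with the vector, which is exactly what makes the read cheap and what the reviewers go on to discuss.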

How was this patch tested?

Existing tests.

cloud-fan · 2017-11-24T16:10:49Z

cc @michal-databricks @kiszk @gatorsmile

SparkQA · 2017-11-24T18:57:20Z

Test build #84170 has finished for PR 19815 at commit ae7db88.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-24T19:03:59Z

Test build #84172 has finished for PR 19815 at commit 3a59b32.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2017-11-25T17:57:10Z

I will take a look this Sunday.

viirya · 2017-11-26T05:04:40Z

Shall we add a comment saying that getUTF8String reuses the data in the column vector? That seems different from the other getXXX APIs.

viirya · 2017-11-26T05:12:33Z

Hmm, but it looks like decodeToBinary will copy the byte data?

That seems orthogonal to this issue. It would be nice if we could avoid the copy though. That would require some work on the dictionary code path.
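The difference between reusing the vector's bytes and copying them can be shown in isolation. The sketch below is a hypothetical illustration, not Spark code: `view` aliases the backing array the way a data-reusing getter does, while `copy` allocates independent bytes the way a copying decode step does. Overwriting the buffer, as happens when a vector is reset and refilled for the next batch, changes what the view reads but not the copy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of view-vs-copy semantics; class and method names are made up.
class ViewVsCopySketch {
  // Zero-copy: the returned buffer shares the caller's backing array.
  static ByteBuffer view(byte[] buf, int offset, int length) {
    return ByteBuffer.wrap(buf, offset, length).slice();
  }

  // Defensive copy: allocates a new array, so later writes to buf do not affect it.
  static byte[] copy(byte[] buf, int offset, int length) {
    return Arrays.copyOfRange(buf, offset, offset + length);
  }

  // Decodes the remaining bytes without moving the original buffer's position.
  static String utf8(ByteBuffer b) {
    return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
  }

  public static void main(String[] args) {
    byte[] batch = "abcdef".getBytes(StandardCharsets.UTF_8);
    ByteBuffer v = view(batch, 0, 3);
    byte[] c = copy(batch, 0, 3);

    // Simulate the vector being reused: the next batch overwrites the same bytes.
    byte[] next = "xyz".getBytes(StandardCharsets.UTF_8);
    System.arraycopy(next, 0, batch, 0, next.length);

    System.out.println(utf8(v));                               // xyz: the view sees the new data
    System.out.println(new String(c, StandardCharsets.UTF_8)); // abc: the copy is unaffected
  }
}
```

Any caller that keeps a zero-copy result beyond the lifetime of the current batch would need to copy it first, which is the assumption the reviewers want documented and checked.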

kiszk · 2017-11-26T06:34:01Z

LGTM except for one comment.

gatorsmile · 2017-11-26T20:10:39Z

It looks risky if we do not make a copy.

If we plan to avoid the unnecessary data copy in this API, we should rename getUTF8String and check all of its callers to make sure they do not break that assumption.

This is a bit of a non-issue: the current on-heap code path already avoids making copies.

It might be a bit better to use arrayData() instead of childColumns[0]; they are practically the same, but arrayData() makes the intent clearer.

gatorsmile · 2017-11-26T20:12:13Z

Move it to

//
// APIs dealing with Bytes
//

Any better names?

hvanhovell · 2017-11-26T21:24:48Z

Same here, use arrayData().

gatorsmile · 2017-11-27T04:52:37Z

LGTM pending Jenkins.

SparkQA · 2017-11-27T05:43:35Z

Test build #84203 has finished for PR 19815 at commit 5711bb2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-11-27T05:49:28Z

Thanks! Merged to master.

cloud-fan force-pushed the load-bytes branch from ae7db88 to 3a59b32 November 24, 2017 16:24

viirya reviewed Nov 26, 2017


gatorsmile reviewed Nov 26, 2017


hvanhovell reviewed Nov 26, 2017


remove ColumnVector#loadBytes

5711bb2

cloud-fan force-pushed the load-bytes branch from 3a59b32 to 5711bb2 November 27, 2017 02:59

asfgit closed this in 5a02e3a Nov 27, 2017


6 participants
